(57) Abstract

In a network environment (200), multiple clients (210) and multiple servers (230) are connected via a local area network (LAN) (220) to a tape backup apparatus (240). Each client (210) and each server is provided with backup agent software (215), which schedules backup operations on the basis of time since the last backup, the amount of information generated since the last backup, or the like. An agent (215a) also sends a request to the tape backup apparatus (240), prior to an actual backup, including information representative of the files that it intends to back up. The tape backup apparatus (240) is provided with a mechanism to receive
DATA BACKUP AND RECOVERY SYSTEMS

The present invention relates to computer data backup and recovery and particularly, but not exclusively, to apparatus, methods and systems for enacting data backup and recovery.

Background

A conventional computer network is illustrated in the diagram in Figure 1. In Figure 1, several client workstations ...

Users perform work on the client workstations that can communicate via a network link with the servers. Clients typically store unique user data whereas servers typically provide a common central point for some function such as shared hard disk storage, data backup storage, software applications or printer control.

There are a number of current schemes used for backing up data on the network. A first scheme is for each client to individually store data to a device such as a tape drive. Tape storage has become the preferred method of backing up information on computers due to its relatively low cost, high capacity, durability and portability. This first scheme requires each person, or each administrator, responsible for a client to back up the data. Also, each client requires its own backup device. A second scheme is for each client to store important data on a remote file server where the server data is backed up at regular intervals, for example on a daily basis. Thus, if a client fails, potentially only less important information is lost. A third scheme, which is available for example to systems which operate under Windows NT4 and are part of an NT Domain, is that all 'specified' information on a client is backed up, typically overnight, to a tape drive connected to an NT server.

Known backup schemes offer relatively good protection for client data, but with a very high administration overhead. For example, if data needs to be recovered by a client, for example as a result of one or more files being lost or destroyed, then the client's owner typically needs to contact an administrator of the backup system and request a restore procedure. Typically, a restore procedure involves the backup system administrator tracking down and mounting a respective tape, on which the last backup of the lost file(s) was made, and initiating the restore procedure. While such a procedure is typically very reliable, it can be perceived as onerous on client users and backup system administrators alike.

Disclosure of the Invention 
In accordance with a first aspect, the present invention provides tape storage apparatus, comprising:

interface means for connecting the apparatus to one or more clients; 
controller means for controlling the apparatus and for processing messages received from the one or more clients;

primary storage means; and 

tape storage means, 
wherein the controller is programmed:

to process backup and restore messages received from the one or more clients in respect of backing up data to, and restoring data from, the primary storage means; and

to back up to the tape storage means, in accordance with predetermined criteria, at least some of the data stored in the primary storage means, and to restore to the primary storage means, in accordance with a respective restore message, ...

... operation ... using data stored in the primary storage means ... means to the tape storage means independently of any messages from the client.

In preferred embodiments, the primary storage means comprises a random access storage means such as a hard disk drive. Alternatively, the primary storage means comprises non-volatile random access memory (NV-RAM). For the latter case, however, the applicants believe that currently NV-RAM would be prohibitively expensive compared to a hard disk drive.

In preferred embodiments, the tape storage apparatus comprises a housing configured specifically to house the controller, the interface means, the primary storage means and the tape storage means. Thus, the tape storage apparatus provides a dedicated and integrated solution to data storage and recovery. In other, less preferred embodiments, the components of the apparatus may be distributed, for example, with some of the components residing on or in other apparatus, such as ...

In accordance with a second aspect, the present invention provides a method of backing up, to a data backup and restore apparatus attached to a network, data stored in one or more clients also attached to the network, the method comprising the data backup and restore apparatus storing in primary data storage a most recent version of all data received from the clients and, from time to time, in accordance with predetermined criteria, storing in secondary data storage at least some of the data stored in the primary data storage.

In accordance with a third aspect, the present invention provides a data storage system comprising:

a network; 
tape storage apparatus as claimed in any one of claims 1 to 14; and

at least one client connected to the apparatus, the (or the at least one) client comprising client storage means and client processing means, the client processing means being programmed in accordance with predetermined criteria to determine when data stored in the client storage means should be backed up to the tape backup apparatus.

Other aspects and embodiments of the invention will become apparent from the following description and claims.

Brief Description of the Drawings 
An embodiment of the present invention will now be described in more detail, by way of example only, with reference to the following drawings, of which:
Figure 1 is a diagram of a conventional computer network;

Figure 2 is a diagram of a computer network modified to operate in accordance with the present exemplary embodiment;

Figure 3 is a diagram illustrating the main functional features of a client according to the present exemplary embodiment;

Figure 4 is a diagram which illustrates the main file structures on the tape backup apparatus to facilitate a backup operation according to the present exemplary embodiment;

Figure 5 is a diagram illustrating the main functional features of a backup apparatus according to the present exemplary embodiment;

Figure 6 is a flow diagram representing a backup operation according to the present exemplary embodiment;

Figure 7 is a diagram which illustrates file information gathered during the procedure described in relation to Figure 6;

Figure 8 is a diagram which represents a redundant file elimination index; and

Figure 9 is a block diagram of backup apparatus according to an embodiment of the present invention.

Best Mode For Carrying Out the Invention & Industrial Applicability
As has already been discussed, hitherto known backup schemes and systems offer relatively good protection for network data, but with a potentially very high administration overhead, particularly when data restoration from tape is required. Administrators have the added problems of backups not being run, or failing due to lack of training or knowledge on the part of the client users. This is especially true of backup scheduling, media handling and maintenance schemes, such as tape rotation or drive cleaning. Failed backups can result in data loss, which, at best, causes time to be lost in recreating the data.
... of networked ... Gbytes of disk space. A tape storage device would need a capacity of 60 Gbytes to guarantee to completely back up the network.

Further, a 10-Base-T network can transfer at most 1 Mbyte of data per second. A full backup would take 60,000 seconds, or about 17 hours. During this time the network would be unavailable for use for any other purpose, which would be unacceptable to most users.
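As a rough check of the figures above, the following sketch divides the network's data volume by the transfer rate. It assumes decimal units (1 Gbyte = 1000 Mbytes) and a constant sustained rate of 1 Mbyte per second; the function name is illustrative only.

```python
def full_backup_time(capacity_gbytes: float, rate_mbytes_per_s: float) -> tuple:
    """Return (seconds, hours) needed to stream a full backup over the network.

    Assumes decimal units (1 Gbyte = 1000 Mbytes) and a constant transfer rate.
    """
    seconds = capacity_gbytes * 1000 / rate_mbytes_per_s
    return seconds, seconds / 3600


seconds, hours = full_backup_time(60, 1)
print(f"{seconds:,.0f} s = {hours:.1f} h")  # prints "60,000 s = 16.7 h"
```

Rounding 16.7 hours gives the figure of about 17 hours quoted above.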
The embodiments described herein aim to address at least some of these issues.

Figure 1 is a diagram which illustrates a prior art networked computer environment 100. As shown, a plurality of computer clients are connected to a computer network 120. In this case the network is a LAN (local area network), such as an Ethernet, which supports the TCP/IP data communications protocol. Also connected to the LAN are a number of servers (designated 130a, 130b, etc). The servers may be, for example, file servers, print servers, email servers, or any combination thereof. In the diagram, a tape drive 140 is connected to the server 130a, where the server 130a is a file server. The data stored on the file server 130a, in on-line storage (where on-line indicates that the storage is accessible at a given instance) such as a hard disk drive, is backed up to the tape drive 140 on a daily basis, typically to a new tape each day. Tape media is termed off-line storage, since when a tape is removed it is no longer accessible. The backup operation is scheduled and controlled by backup software, which runs as a process on the file server 130a. The backup software schedules the backup to happen at, for example, ...

Figure 2 is a diagram which illustrates a networked computer environment 200 modified for operation according to the present embodiment. As shown, a plurality of clients (designated 210a, 210b, etc) are connected to a computer network 220. In this case also, the network is a LAN (local area network), such as an Ethernet, which supports the TCP/IP data communications protocol. A number of servers (designated 230a, 230b, etc) are connected to the LAN 220. The servers, as above, may be file servers, print servers, email servers, or any combination thereof. Also connected to the LAN 220 is a tape backup apparatus 240 according to the present embodiment. The tape backup apparatus 240 is shown to include a hard disk drive device 242 in addition to a tape drive mechanism 244. The operation of the tape backup apparatus 240 will be described in more detail below.

In general terms, servers can be thought of as more powerful clients, or clients with large amounts of disk storage. Thus, for the sake of convenience and unless otherwise stated, the term "client" when used hereafter shall be taken to mean either a server or a client in a network environment. Further, for ease of description only, the term "client" shall also be taken to include any device or apparatus which stores data locally which can be backed up remotely.

Hereafter, unless otherwise indicated, only one client system 210a will be considered, although it will be understood that the other clients operate in the same manner.

The client 210a includes a backup agent 215a, which comprises one or more software routines. The main functional modules of the backup agent 215a are illustrated in the diagram in Figure 3. Each module comprises one or more software routines, written for example in the C++ programming language, which operate in conjunction with the tape backup apparatus 240 as described in detail below. The software routines are stored on a hard disk drive device (not shown) in the client and are loaded into main random access memory (RAM) when they are required to operate, and a central processor in the client processes the instructions to control the client to operate in accordance with the present embodiment. The lines illustrated interconnecting the various modules and diagram blocks represent communications channels which are open some or all of the time, depending on requirement. The client 210a is a general purpose computer, such as a PC running the Windows NT 4.0 operating system.

DYNAMIC SCHEDULER MODULE

In Figure 3, a dynamic scheduler 310 is a module responsible for dynamically initiating a backup cycle from the client 210a, based on time since the last backup and the local system resources. A local backup configuration file 312 contains details on a general network-wide policy set by the network administrator and a local user-defined policy, in terms of a target time delay before data is protected. For example, one default policy would be to attempt a backup once an hour. The dynamic scheduler 310 is a background process which runs permanently on the client 210a.

After the target time delay (e.g. 1 hour) has passed, the scheduler 310 assesses the local client system 210a resources to ensure that the backup can run without seriously impacting the system performance. If the local client system 210a is heavily loaded, for example at 95% capacity, the scheduler 310 will retry a short period later (e.g. after 5 minutes) and continue retrying until the system has enough free resources to run the backup. There is an upper time limit to the retry, for example 30 minutes, after which time a backup is forced irrespective of system loading. The upper time limit for retrying is another general policy variable stored locally in the backup configuration file 312.
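The retry policy just described can be sketched as a small decision function. This is a minimal illustration under stated assumptions: the policy field names (target_delay_s, retry_interval_s, max_retry_s, busy_load) are hypothetical stand-ins for the values in the backup configuration file 312, and load is taken as a fraction between 0 and 1.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SchedulerPolicy:
    # Hypothetical names for policy values held in the backup configuration file 312.
    target_delay_s: float = 3600.0    # attempt a backup once an hour
    retry_interval_s: float = 300.0   # retry after 5 minutes while the client is busy
    max_retry_s: float = 1800.0       # force a backup after 30 minutes of retrying
    busy_load: float = 0.95           # load at or above this counts as "heavily loaded"


def decide(now: float, last_backup: float, first_attempt: Optional[float],
           load: float, policy: SchedulerPolicy) -> Tuple[str, float]:
    """Return (action, next_check_time) for one tick of the dynamic scheduler.

    first_attempt is the time of the first attempt in the current retry cycle,
    or None if no attempt has been made yet.
    """
    due = last_backup + policy.target_delay_s
    if now < due:
        return "wait", due                         # target time delay not yet passed
    if load < policy.busy_load:
        return "start", now                        # enough free resources: run the backup
    if first_attempt is not None and now - first_attempt >= policy.max_retry_s:
        return "start", now                        # retry limit reached: force the backup
    return "retry", now + policy.retry_interval_s  # client too busy: try again shortly
```

A caller would record first_attempt when the first "retry" is returned and clear it once the backup starts.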

Once the local client system resources allow, the scheduler 310 communicates with the tape backup apparatus 240 to request a backup slot. The tape backup apparatus 240 will allow the backup job to start if the network bandwidth can support it. If there are already other backups running from other clients, which are using up all of the network bandwidth allocated to backups, the tape backup apparatus 240 communicates with the client 210a to refuse the request. As a result, the scheduler 310 returns to the retry cycle and waits until the tape backup apparatus 240 gives permission to proceed.
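On the apparatus side, the grant-or-refuse decision might look like the following sketch. The class and method names are hypothetical, and it assumes, for simplicity, that each running backup job reserves a fixed amount of bandwidth; the text only states that requests are refused when the bandwidth allocated to backups is already in use.

```python
from typing import Set


class BackupSlotManager:
    """Hypothetical admission control for backup slot requests.

    A slot is granted only if, with the new job added, the bandwidth reserved
    for running backup jobs stays within the share of the LAN allocated to backups.
    """

    def __init__(self, link_bps: float, backup_fraction: float, per_job_bps: float):
        self.budget_bps = link_bps * backup_fraction  # e.g. 5% of a 10 Mbit/s LAN
        self.per_job_bps = per_job_bps                # bandwidth reserved per backup job
        self.active: Set[str] = set()

    def request_slot(self, client_id: str) -> bool:
        if client_id in self.active:
            return True                               # client already holds a slot
        if (len(self.active) + 1) * self.per_job_bps > self.budget_bps:
            return False                              # refuse: the client goes back to retrying
        self.active.add(client_id)
        return True

    def release_slot(self, client_id: str) -> None:
        self.active.discard(client_id)                # backup finished: free the bandwidth
```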
Because the backup agent 215a dynamically initiates the backup job, the overall network backup scheme is configuration independent. Thus, a client can power-save, or a portable client can be detached from the network, and on return the backups can continue seamlessly. Thus, there is no requirement for the tape backup apparatus 240 to have any knowledge of which clients are connected, or are not connected, to the network at any time.

The dynamic scheduler 310 also has responsibility for initiating the other modules when required.

ACTIVE FILE MANAGER MODULE 
An active file manager module (AFM) 320 monitors which files are to be opened by the backup agent 215a for the purpose of backing up. Before a file is opened, the AFM 320 checks the file system 322 to see if the file is already in use by another program running on the client. If it is already in use, the AFM 320 waits until the file is in a "safe" state for the backup agent 215a to back it up. Only when the file is "safe" does the AFM 320 allow the backup agent 215a to open the file. A file can be "safe" even if the file is locked. In this case, if the file is written to during the backup operation, the AFM 320 automatically preserves the data to be backed up by sending the original data directly to the tape backup apparatus 240 over the network. The tape backup apparatus 240 can then manage or reorder these out-of-order backup blocks, thus preserving the original state of the file from when the backup started.
For example, consider a database of addresses (not shown), which is being backed up. During the backup, a user changes one of the entries in a part of the database that has not yet been backed up. The AFM 320 immediately sends the old address to the backup server, and when the backup reaches this point it skips the updated address in the database. This means that when the database application thinks it has written data to disk, it has indeed been written to disk, and not cached somewhere else. Thus, there is no possibility of data loss or corruption if the server 240 were to crash during the backup.

The AFM 320 can be user-configured to determine when a file is "safe" to back up. For example, the AFM 320 can use a write inactivity period to decide this, which would be one of the general policy values stored in the backup configuration file 312. In order to ensure that a backup copy of a file does not contain a partial transaction, the AFM 320 monitors the period of time that passes without a write taking place. For example, if the time is set to 5 seconds, the file is not safe until there is a 5 second period when there are no writes active, and at this point the file can be backed up. There is also a value for the period of time after which the AFM 320 gives up trying to find a safe state. For example, if this time is set to 60 seconds, then the AFM 320 will try for one minute to find a 5 second period with no writes.
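The safe-state rule above amounts to searching a write log for the first quiet window. The following sketch assumes the AFM has a list of observed write times for the file (a hypothetical input) and uses the 5 second and 60 second example values as defaults.

```python
from typing import Iterable, Optional


def find_safe_point(write_times: Iterable[float], start: float,
                    quiet_s: float = 5.0, give_up_s: float = 60.0) -> Optional[float]:
    """Return the earliest time at which the file has been free of writes for
    quiet_s seconds, measured from `start`, or None if that does not happen
    within give_up_s seconds of `start`.

    write_times is a hypothetical log of the times at which writes to the file
    were observed (e.g. by hooking file system write calls).
    """
    window_start = start
    for t in sorted(write_times):
        if t < start:
            continue                    # writes before monitoring began are ignored
        if t - window_start >= quiet_s:
            break                       # a full quiet period elapsed before this write
        window_start = t                # a write restarts the quiet period
    safe_at = window_start + quiet_s
    return safe_at if safe_at - start <= give_up_s else None
```

For writes at 1, 3, 4 and 12 seconds, the first 5 second gap runs from 4 to 9 seconds, so the file becomes safe at 9 seconds.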

Some applications, notably databases, operate a number of files simultaneously (e.g. data files and index files), and to assure the overall integrity of such files they must be configured as a "group". A group defines a number of files that must be backed up from a collectively safe state, and only when every file in the group is simultaneously in a "safe" state can each file be backed up. This grouping is performed automatically by the AFM 320 when it detects one of the major database types (e.g. Exchange, Notes, SQL Server and Oracle). Further, the AFM 320 may be configured to treat user-defined lists of files as groups, with the group definitions being stored in the backup configuration file 312.
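For a group, the same quiet-period test must hold for every member at the same moment. A minimal sketch, assuming a hypothetical map from file name to the time of its last observed write; files with no recorded write are treated as quiet.

```python
from typing import Dict, List


def blocking_files(last_write: Dict[str, float], group: List[str],
                   now: float, quiet_s: float = 5.0) -> List[str]:
    """Files in the group that have been written to within the last quiet_s seconds."""
    return [name for name in group
            if now - last_write.get(name, float("-inf")) < quiet_s]


def group_is_safe(last_write: Dict[str, float], group: List[str],
                  now: float, quiet_s: float = 5.0) -> bool:
    """A group may be backed up only when every member is "safe" simultaneously."""
    return not blocking_files(last_write, group, now, quiet_s)
```

Reporting the blocking files, rather than a bare yes/no, lets the caller see which member of a database group is holding up the backup.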



FILE DIFFERENCING MODULE 

A file differencing module (FDM) 330 is a module in the backup agent 215a that selects the files to be backed up by determining which files have changed or been added since the last backup. The module achieves this by reading the current directory tree of the local file system 322 and checking each file's modified time/date against the entries in a cached Directory Tree File (DTF) 332 generated from the last backup. Modified files will have different times and dates, and new files will have no corresponding entry. Modified files are marked as "Modified" and new files are marked as "New". Note that for the first backup after installation all files will be new files.
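The comparison against the cached DTF can be sketched as follows. As a simplifying assumption, the DTF is modelled here as a plain mapping from path to modification time; the real DTF also carries the pointers into the BDFs described later. The function names are illustrative.

```python
import os
from typing import Dict


def scan_tree(root: str) -> Dict[str, float]:
    """Current directory tree of the local file system: path -> modified time."""
    tree = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            tree[path] = os.path.getmtime(path)
    return tree


def classify(current: Dict[str, float], cached_dtf: Dict[str, float]) -> Dict[str, str]:
    """Mark changed files "New" or "Modified" by comparison with the cached DTF.

    Files whose modified time matches their DTF entry are unchanged and omitted.
    """
    marks = {}
    for path, mtime in current.items():
        if path not in cached_dtf:
            marks[path] = "New"        # no entry from the last backup
        elif cached_dtf[path] != mtime:
            marks[path] = "Modified"   # modified time/date differs from the DTF
    return marks
```

With an empty cached DTF, as on the first backup after installation, every file is classified as "New".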

Before the list of modified or new files is further processed, the list is filtered for excluded files (such as temporary files, Internet cache files, swap files, etc). The policy for excluding files is held in the local backup configuration file 312, and is generated from a general network policy set by the network administrator and also from a user-defined set of excluded files or directories.
The next stage is to determine which of the new files are already held on the tape backup apparatus 240, and are thus redundant. For example, if there has already been a backup of a Windows95 workstation, then subsequent backups will determine that the Windows95 operating system files are redundant. The FDM 330 first sends the list of the selected new files to the tape backup apparatus 240. The list contains for each file a 32-bit CRC code (calculated for the respective name, date/time stamp and file size information). The tape backup apparatus 240 returns a list of the files that match its existing backup file contents, and for each file it also returns a signature (in this case a 32-bit CRC checksum calculated over the actual file data) and an indication of the location of the file on the backup server. For each of the potentially redundant files in the list, the FDM 330 generates a respective signature value and compares it with the value returned by the tape backup apparatus 240. Where the signatures match, the file marking is changed from "New" to "Redundant". Thus, the output of the FDM 330 is a list of all files which are new or modified since the last backup, marked as:

"Redundant", copy already held on backup server;

"New", new file, thus no need for block differencing; or

"Modified", use block differencing to determine which blocks changed.
As well as files, the FDM 330 identifies any modifications to the system information used to rebuild the system in disaster recovery. This covers areas such as NetWare partition information, file system types and details (e.g. compressed), and bootstrap partitions (e.g. MBR, NetWare DOS partition).
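The two-stage redundancy check described above (a CRC over each new file's name, date/time stamp and size, then a CRC over its contents) can be sketched with zlib.crc32, which computes a standard 32-bit CRC. How the metadata is serialised before hashing, and the shape of the apparatus index, are assumptions made for illustration.

```python
import zlib
from typing import Dict, Tuple


def metadata_crc(name: str, mtime: int, size: int) -> int:
    # Hypothetical encoding of the name, date/time stamp and size before hashing.
    return zlib.crc32(f"{name}|{mtime}|{size}".encode("utf-8"))


def server_lookup(requested: Dict[str, int],
                  index: Dict[int, Tuple[int, str]]) -> Dict[str, Tuple[int, str]]:
    """Apparatus side: for each requested metadata CRC already held, return
    (content CRC, location of the stored copy)."""
    return {name: index[crc] for name, crc in requested.items() if crc in index}


def mark_new_files(new_files: Dict[str, Tuple[int, int, bytes]],
                   index: Dict[int, Tuple[int, str]]) -> Dict[str, str]:
    """Client side: change "New" to "Redundant" when the content signatures match."""
    requested = {name: metadata_crc(name, mtime, size)
                 for name, (mtime, size, _data) in new_files.items()}
    matches = server_lookup(requested, index)
    marks = {}
    for name, (_mtime, _size, data) in new_files.items():
        if name in matches and zlib.crc32(data) == matches[name][0]:
            marks[name] = "Redundant"
        else:
            marks[name] = "New"
    return marks
```

Only the cheap metadata CRCs cross the network in the first step; the client reads file contents just for the candidates the apparatus reports back.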

BLOCK DIFFERENCING 

A block-differencing module (BDM) 340 determines which blocks in each file have changed since the last backup. The process of identifying the changed portions (deltas) of files is performed by two basic processes. The first process is a sliding fingerprint (SFP) process 342. In general, a fingerprinting process is a probabilistic algorithm, where the probability of a failure is made far less than the probability of an undetected failure of the underlying storage and communication media (for further detailed information on fingerprinting, the reader is referred to the technical report by Richard Karp and Michael Rabin, "Efficient randomized pattern-matching algorithms", Harvard University Centre for Research in Computing Technology, TR-31-81, Dec 1981). The second process involves active detection of writes to regions of files; this technique requires a process called a file delta accelerator (FDA) process 345. The FDA process 345 is a background process which operates all the time to monitor the client's operating system 312 write calls and maintain a log 347 of which logical regions of which files have been modified.

The FDA process 345 is more efficient for files that are updated in place (e.g. databases), while the SFP process 342 is far more efficient for document files that are entirely (or largely) rewritten with each update, even though only a small portion of the file may have been modified. As will be described, the present embodiment makes use of a combination of an SFP process 342 and an FDA process 345. As each modified file is opened for backup, the FDA process log 347 is checked to see how much of the file has been modified. If more than a threshold percentage, for example 5-10 percent, has been modified, and if the absolute size of the changes is smaller than a given size (e.g. 2 MB), then the SFP process 342 is selected as the appropriate process to use. Otherwise, the FDA-detected regions are used. Note that if the local client 210a crashes without a 'clean' FDA shutdown, all FDA log 347 information is invalidated, so the BDM 340 must temporarily revert to the SFP process (or a conventional incremental backup) when the next backup is performed.
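The choice between the two delta processes can be written as a small function. The 5 percent threshold and the 2 MB limit are the example values from the text, with 2 MB taken as 2 * 1024 * 1024 bytes (an assumption). Passing None for the FDA figure models an invalidated FDA log; the function and parameter names are illustrative.

```python
from typing import Optional


def choose_delta_method(file_size: int, fda_modified_bytes: Optional[int],
                        threshold_fraction: float = 0.05,
                        max_change_bytes: int = 2 * 1024 * 1024) -> str:
    """Pick the delta process for one modified file.

    fda_modified_bytes is the amount of the file recorded as modified in the
    FDA log, or None if the log is invalid (e.g. after an unclean shutdown).
    """
    if fda_modified_bytes is None:
        return "SFP"                     # FDA log unusable: fall back to the sliding fingerprint
    fraction = fda_modified_bytes / file_size if file_size else 1.0
    if fraction > threshold_fraction and fda_modified_bytes < max_change_bytes:
        return "SFP"                     # large share changed, small absolute size: use SFP
    return "FDA"                         # otherwise back up the FDA-detected regions
```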
The SFP process 342 divides an updated file into equal-sized "chunks", the size of which varies depending on the file size. Each chunk has a 12-byte fingerprint calculated for it, and the fingerprints are sent with the backup data for the file to be stored by the tape backup apparatus 240. When a file is to be checked with the SFP process 342, the BDM 340 communicates with the tape backup apparatus 240 to download the fingerprint set for the file in question. It is also possible to locally cache fingerprint sets for files that are frequently accessed. The SFP process 342 then calculates the fingerprint function for the updated version of the file, starting from the first byte and using the same chunk size as for the last backup of the file. Then the SFP process 342 compares the resulting new first fingerprint with the previous fingerprint set to find a match. If there is a match, then the chunk starting at that byte is already present on the tape backup apparatus 240, and thus need not be backed up. If there is no match, then the fingerprint function calculation is repeated but starting at the next (second) byte, and so on.
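The byte-by-byte search described above is essentially Rabin-Karp style rolling-hash matching. The sketch below uses a simple polynomial rolling hash modulo a large prime in place of the 12-byte fingerprint, whose exact function the text does not specify; BASE, MOD and the function names are assumptions. A production version would either verify a matched chunk or rely on a fingerprint wide enough to make collisions negligible, as the text describes.

```python
from typing import Dict, List, Tuple

BASE = 257              # polynomial base (assumed; larger than any byte value)
MOD = (1 << 61) - 1     # large prime modulus (assumed; not taken from the source)


def fingerprint(chunk: bytes) -> int:
    """Polynomial hash of one chunk (a stand-in for the 12-byte fingerprint)."""
    h = 0
    for b in chunk:
        h = (h * BASE + b) % MOD
    return h


def chunk_fingerprints(data: bytes, size: int) -> Dict[int, int]:
    """Fingerprints of the full-size chunks of the previously backed up file,
    mapped to their chunk index."""
    return {fingerprint(data[i:i + size]): i // size
            for i in range(0, len(data) - size + 1, size)}


def sliding_match(new: bytes, old_prints: Dict[int, int], size: int
                  ) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Slide a window over the updated file one byte at a time.

    Returns (matches, literals): matches are (offset in new file, old chunk index)
    for chunks already held by the backup apparatus; literals are [start, end)
    byte ranges of changed data that must be sent.
    """
    matches, literals = [], []
    pos = lit_start = 0
    high = pow(BASE, size - 1, MOD)           # weight of the byte leaving the window
    h = fingerprint(new[:size]) if len(new) >= size else None
    while h is not None:
        idx = old_prints.get(h)
        if idx is not None:                   # chunk already stored: no need to send it
            if lit_start < pos:
                literals.append((lit_start, pos))
            matches.append((pos, idx))
            pos += size
            lit_start = pos
            h = fingerprint(new[pos:pos + size]) if pos + size <= len(new) else None
        elif pos + size < len(new):
            # roll the window on by one byte: drop new[pos], add new[pos + size]
            h = ((h - new[pos] * high) * BASE + new[pos + size]) % MOD
            pos += 1
        else:
            h = None                          # no further windows fit in the file
    if lit_start < len(new):
        literals.append((lit_start, len(new)))
    return matches, literals
```

For example, inserting one byte after the first 4-byte chunk of b"AAAABBBBCCCC" leaves all three old chunks matched and only the inserted byte, at range (4, 5), to be sent.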
For all files that are "Modified", the block differencing process is performed as described above, producing a stream of modified chunks (plus new fingerprints) for each file. For "New" files, there is no need for block differencing, so the entire file is broken up into chunks (the chunk size depends on the file size) with a new fingerprint being calculated for each chunk. All these file chunks (plus new fingerprints) are sent to a data transfer module 350, described below in more detail, to be compressed and sent to the tape backup apparatus 240.

DATA TRANSFER MODULE 

A data transfer module (DTM) 350 performs the actual transfer of the backup data from the backup agent 215a to the tape backup apparatus 240. As chunks of backup data (plus fingerprints) are received from the BDM 340, they are compressed and added to a backup stream of data for transfer to the tape backup apparatus 240. There is a delay between the transfer of each chunk due to the times taken to obtain the chunk and compress it, and the need to limit the client backup transfer data rate. This method breaks the backup stream into small discrete pieces, thus making it less network bandwidth intensive. The selected data transfer rate and delay are determined by the tape backup apparatus 240, as will be described.
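A minimal sketch of the compress-and-append step, using zlib for compression. The frame layout (a 4-byte length, the 12-byte fingerprint, then the compressed chunk) is hypothetical, since the text does not specify the stream format, and the pacing delay between chunks is not shown.

```python
import struct
import zlib
from typing import Iterable, List, Tuple

HEADER = struct.Struct(">I")   # hypothetical 4-byte big-endian length prefix
FP_LEN = 12                    # fingerprint length given in the text


def encode_stream(chunks: Iterable[Tuple[bytes, bytes]]) -> bytes:
    """Compress each (chunk data, fingerprint) pair and append it to the stream.

    Frame layout (an assumption): length of compressed data, fingerprint, compressed data.
    """
    out = bytearray()
    for data, fp in chunks:
        if len(fp) != FP_LEN:
            raise ValueError("expected a 12-byte fingerprint")
        body = zlib.compress(data)
        out += HEADER.pack(len(body)) + fp + body
    return bytes(out)


def decode_stream(stream: bytes) -> List[Tuple[bytes, bytes]]:
    """Inverse of encode_stream, as the receiving apparatus might parse it."""
    chunks, pos = [], 0
    while pos < len(stream):
        (n,) = HEADER.unpack_from(stream, pos)
        pos += HEADER.size
        fp = stream[pos:pos + FP_LEN]
        pos += FP_LEN
        chunks.append((zlib.decompress(stream[pos:pos + n]), fp))
        pos += n
    return chunks
```

Compressing each chunk independently keeps the pieces self-contained, so the apparatus can store or reorder them individually, at some cost in compression ratio compared with compressing the whole stream.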

All the differences in all the changed files since the last backup are stored in backup directory files (BDFs). BDFs also contain a fingerprint for each respective file chunk, and redundant file elimination (RFE) index information (date/time stamps, signatures, etc) for each changed file, which will be described below.

All backup data is indexed so that it can be reconstructed from the various BDFs on the tape backup apparatus 240. Pointers to areas of the various BDFs are used for reconstruction purposes, and these pointers are held in the DTF, which indexes all the files on the tape backup apparatus 240.

An exemplary DTF and associated BDFs, BDF1 and BDF2, are illustrated in Figure 4. For example, with reference to Figure 4, consider the scenario in which a file, File1, was originally backed up in BDF1 400, and then the first chunk, Chunk1a, was modified and was stored in BDF2 405 as Chunk1b. Then, the entry in the DTF 410 has a pointer, Pointer1, to BDF2 for the first chunk, Chunk1b, and also a pointer, Pointer2, to BDF1 for the unchanged chunks, Chunk2a and Chunk3a, of the file. Thus, for a restore operation of File1, File1 comprises Chunk1b (in BDF2) and all chunks from Chunk2a (in BDF1).

For "Redundant" files, the entry in the DTF 410 is a copy of the pointer(s) to the already existing copy of that file on the tape backup apparatus 240.
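The pointer structure of Figure 4 can be modelled in a few lines. In this sketch, which is an illustrative assumption rather than the on-tape format, each BDF is a list of chunks and each DTF pointer names a BDF, a starting chunk and a chunk count; restoring File1 follows Pointer1 and then Pointer2.

```python
from typing import Dict, List, NamedTuple


class Pointer(NamedTuple):
    bdf: str     # backup directory file holding the data
    start: int   # index of the first chunk within that BDF
    count: int   # number of consecutive chunks covered by the pointer


def restore(entry: List[Pointer], bdfs: Dict[str, List[bytes]]) -> bytes:
    """Rebuild a file by following its DTF pointers in order."""
    out = bytearray()
    for p in entry:
        chunks = bdfs[p.bdf][p.start:p.start + p.count]
        if len(chunks) != p.count:
            raise ValueError(f"pointer {p} runs past the end of {p.bdf}")
        out += b"".join(chunks)
    return bytes(out)


# Figure 4 scenario: Chunk1a was modified and stored in BDF2 as Chunk1b.
bdfs = {
    "BDF1": [b"old-1|", b"part-2|", b"part-3"],  # Chunk1a, Chunk2a, Chunk3a
    "BDF2": [b"new-1|"],                         # Chunk1b
}
dtf = {"File1": [Pointer("BDF2", 0, 1),          # Pointer1 -> Chunk1b
                 Pointer("BDF1", 1, 2)]}         # Pointer2 -> Chunk2a onwards
dtf["File1-copy"] = list(dtf["File1"])            # a "Redundant" file reuses the pointers
print(restore(dtf["File1"], bdfs))                # b'new-1|part-2|part-3'
```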

Every time a backup is performed, a new DTF is generated. The new DTF is sent to the tape backup apparatus 240 and also cached to the local client system. Since only a small number of files will typically have changed since the last backup, the new DTF can use pointers to the previous DTF for those parts of the directory that are unchanged.
RESTORE MODULE 

A restore module performs restore operations using the DTFs to generate a directory tree of all files that can be restored. The restore module can either use the local cached copies of the DTFs (for recent views) or download other ones from the tape backup apparatus 240 (for older views). If the restore is being performed from a mounted tape (e.g. a historical archive) in the tape backup apparatus 240, ... restore view. Any restore from tape media, rather than from the hard disk drive device, would be much slower.

Since the DTF generated for every delta backup is a (virtual) complete list of all files, the user can change the restore view to an earlier backup and restore an older copy of a file. By default, the initial restore tree is from the latest backup.
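Choosing which DTF supplies the restore view amounts to a search over backup times. A minimal sketch, assuming the DTFs are identified by the sorted times at which they were generated (a hypothetical representation); with no time given it returns the latest backup, matching the default view.

```python
import bisect
from typing import List, Optional


def select_dtf(backup_times: List[float], wanted: Optional[float] = None) -> int:
    """Index of the DTF giving the restore view at time `wanted`.

    backup_times holds the (sorted) times at which each DTF was generated.
    With no time given, the latest backup is used, as in the default view.
    """
    if not backup_times:
        raise ValueError("no backups available")
    if wanted is None:
        return len(backup_times) - 1
    i = bisect.bisect_right(backup_times, wanted) - 1   # last backup at or before `wanted`
    if i < 0:
        raise ValueError("no backup at or before the requested time")
    return i
```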

When a user selects to restore a specific file from a specific backup, the DTFs are used to identify which portions of which BDF contain the file data. This data is then copied from the tape backup apparatus 240 to the backup agent 215a, decompressed, and written to the specified location in the client storage.

A directory tree of all files which can be restored is generated and viewed via a graphical user interface, for example an extension to the Windows Explorer program available on Microsoft Windows 95 and Windows NT4. This provides a familiar and easy to use environment for users to restore their backup data. This process covers restoration from the local client 210a. However, this does not apply to server data, particularly for NetWare, which does not have a graphical console. In this case, the server data restore tree would need to be available through a remote workstation console (not shown). There are two methods by which this could be done:

* if a user logs in as a backup administrator in the tape backup apparatus administration interface, then display ALL server volumes for restore; or

* alternatively, use the configured server drive mappings to indicate which server volumes to display in the restore directory tree. File security information stored in the BDFs is used to filter the restore tree based on the user security used for each server drive mapping.


TAPE BACKUP APPARATUS 

According to the present embodiment, the tape backup apparatus 240 functionally comprises a set of modules each consisting of one or more control programs. The programs may comprise software routines, written for example in the C++ programming language, but preferably comprise firmware stored in non-volatile memory such as read only memory (ROM), or hardware comprising application specific integrated circuits (ASICs). In the present embodiment, the tape backup apparatus 240 is in the form of an appliance, as will be described below in more detail with reference to Figure 9.

The tape backup apparatus 240 comprises two levels of data storage for backup data. The first level of data storage is on-line, random access storage, in the form of a hard disk drive device of sufficient capacity potentially to store at least all data from all local client storage. From this hard disk drive device 244, any client can restore any of its most recently backed up files or whole file system without having to address any tape storage. Hitherto, tape backup systems known to the applicants rely on tape backup as the first, and typically only, level of backup. The second level of storage comprises off-site, off-line tapes, which are removed from the tape backup apparatus 240 by the system administrator. Data on a tape can be accessed by a client once the tape has been reloaded into the tape backup apparatus 240, since the tape can be 'mounted' as a volume of the file system of the tape backup apparatus 240. Of course, data recovery from tape will always take longer than data recovery from on-line storage.

The tape backup apparatus 240 according to the present invention provides extremely convenient access by a client to recover one or more lost files, which have been backed up, without the need to find an archived tape. Typically, tape-based backup systems use a different tape for each day of the week, and thus require use of a particular tape for the day on which the data was last backed up to restore any data which has been lost. This means that generally tape-based backup systems must regularly (e.g. once a week) repeat a full backup of all data (including unchanged files that have already been backed up) to prevent needing an unmanageable number of tapes to restore a file or system. The present tape backup apparatus 240 maintains in on-line storage 244 an instance of every backed up client file, and any or all files can be restored at any time by a client-initiated process. This also means that there is no longer any need to repeat a backup of unchanged files; only the changes are sent after the first backup.

The major functional modules in the tape backup apparatus 240 will now be described in association with the functional block diagram in Figure 5.



BACKUP DYNAMIC SCHEDULER 

A backup dynamic scheduler 500, or backup scheduler, for the tape backup apparatus 240 works in conjunction with the dynamic scheduler 310 in the client backup agent 215a. The role of the backup scheduler 500 is to control the flow of backup jobs in order to control the network bandwidth used by the backup data traffic.

The backup scheduler 500 varies the active jobs so that the amount of backup traffic over the network is 'throttled' (or restricted) and is kept within a defined bandwidth, such as 5%. Thus, the backup traffic will be guaranteed never to use more than 5% of the network bandwidth. If there are too many backup jobs or too much changed data, then the time to complete the backup jobs will extend. Thus, the tape backup apparatus scheduling of the backup jobs may mean that data cannot be protected within a given period (e.g. 1 hour). The parameters and their respective thresholds, which are used to decide whether a backup operation can be allowed, are stored in a backup configuration file 504. There are two basic methods that can be used to throttle the backup traffic:

1. Each backup agent transfers the backup data at a specified, controlled rate (e.g. 50KB/sec) by adding artificial delays between small backup data blocks. For example, a single client sending 16K blocks with 200ms delays between blocks uses approximately 5% of the available network bandwidth of a 10Mbit/s Ethernet.

2. Each backup agent, when it is active, bursts the backup data, aiming to complete the backup in a short time. However, the size of the backup data blocks needs to be limited (e.g. to 16K) so that the backup does not use all available network bandwidth. A single client streaming 16K blocks uses approximately 25% of the available network bandwidth, two streaming clients use 45%, and so on. The throttling will then sequence the jobs so that only a small number (e.g. 2) are active simultaneously, and add large delays between jobs so that the overall average bandwidth used is 5%.
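The throttling arithmetic for both methods can be sketched in Python as follows. This is a minimal, idealised model, not the patent's implementation: the function names are hypothetical, "16K" is taken as 16,000 bytes, and the time spent actually transmitting each block, protocol overhead and other network traffic are all ignored.

```python
def paced_fraction(block_bytes: int, delay_s: float, link_bps: float) -> float:
    """Method 1: share of the link used by one agent sending one block per delay_s.

    Idealised: ignores the time taken to transmit each block and protocol overhead.
    """
    return (block_bytes * 8) / delay_s / link_bps


def delay_for_fraction(block_bytes: int, target: float, link_bps: float) -> float:
    """Method 1 inverted: inter-block delay that holds one agent to `target`."""
    return (block_bytes * 8) / (target * link_bps)


def idle_gap(burst_fraction: float, burst_s: float, target: float) -> float:
    """Method 2: idle time after a burst so that the long-run average equals `target`.

    average = burst_fraction * burst_s / (burst_s + gap), solved for gap.
    """
    if burst_fraction <= target:
        return 0.0  # the burst alone already stays within the target
    return burst_s * (burst_fraction / target - 1.0)


LINK_BPS = 10_000_000  # 10 Mbit/s Ethernet, as in the examples above

print(round(paced_fraction(16_000, 0.2, LINK_BPS), 3))       # 16K every 200 ms -> 0.064
print(round(delay_for_fraction(16_000, 0.05, LINK_BPS), 3))  # delay for exactly 5% -> 0.256
print(round(idle_gap(0.45, 60.0, 0.05), 1))                   # two streaming clients -> 480.0
```

Under this model the 16K/200 ms example comes to roughly 6% of a 10 Mbit/s link, close to the approximate 5% quoted above; holding exactly 5% would need a delay of about 256 ms. For the second method, a pair of clients bursting at 45% must stay idle for about eight times the length of each burst.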
The backup scheduler 500 also includes a prioritisation scheme based on the time jobs have been waiting to start, the estimated amount of data sent in each job, and the available network bandwidth. The prioritisation scheme variables are stored in a prioritisation file 502 on the tape backup apparatus 240. For example, if a backup request from a backup agent is refused due to insufficient network bandwidth, the time of the first refusal is logged, and subsequent requests are compared with other outstanding requests according to the length of time since the first refusal. The job that has been waiting the longest will be started first. The estimated size of the jobs can be determined by an adaptive algorithm in the backup scheduler 500 that 'learns' the average job size from the system, by averaging over all jobs received in a fixed time period of, for example, one week.
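A minimal sketch of the prioritisation rules just described: only the first refusal time is logged, the longest-waiting job starts first, and job size is estimated by averaging over a sliding one-week window. The class and method names are hypothetical, the state is kept in memory rather than in the prioritisation file 502, and the available-bandwidth term of the scheme is not modelled.

```python
from __future__ import annotations

import heapq
from collections import deque
from dataclasses import dataclass, field


@dataclass(order=True)
class PendingJob:
    first_refused_at: float                # time of the FIRST refusal (sort key)
    agent_id: str = field(compare=False)


class BackupPrioritiser:
    """Hypothetical sketch: longest-waiting job first, plus a rolling
    average of job size over a fixed window (one week by default)."""

    def __init__(self, window_s: float = 7 * 24 * 3600) -> None:
        self.window_s = window_s
        self._queue: list[PendingJob] = []
        self._waiting: dict[str, PendingJob] = {}
        self._sizes: deque[tuple[float, int]] = deque()  # (completion time, bytes)

    def refuse(self, agent_id: str, now: float) -> None:
        # Only the first refusal is logged; repeat requests keep their place.
        if agent_id not in self._waiting:
            job = PendingJob(now, agent_id)
            self._waiting[agent_id] = job
            heapq.heappush(self._queue, job)

    def next_job(self) -> str | None:
        # The job waiting longest (earliest first refusal) is started first.
        if not self._queue:
            return None
        job = heapq.heappop(self._queue)
        del self._waiting[job.agent_id]
        return job.agent_id

    def record_completed(self, size_bytes: int, now: float) -> None:
        # Keep only jobs inside the averaging window.
        self._sizes.append((now, size_bytes))
        while self._sizes and now - self._sizes[0][0] > self.window_s:
            self._sizes.popleft()

    def estimated_job_size(self) -> float:
        if not self._sizes:
            return 0.0
        return sum(size for _, size in self._sizes) / len(self._sizes)
```

A repeat request from an agent that is already waiting does not reset its place in the queue, which is what gives long-waiting jobs their priority.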

The backup scheduler 500 also adapts to the network conditions so that, if the network consistently has much more than 5% available bandwidth, the backup scheduler will sequence backup jobs to use more network bandwidth (e.g. 10%) during slack periods.

The tape backup apparatus administrator may configure the backup scheduler 500 to give priority to the backup jobs at the expense of the network bandwidth, in which case the job sequencing priorities are assigned based on the time since the last backup rather than on network bandwidth.

REDUNDANT FILE ELIMINATION MODULE 

A redundant file elimination (RFE) module 510 maintains an index, the RFE index 512, in the tape backup apparatus 240. The RFE index 512 is a database, held either in memory or on the hard disk drive device 244, listing all the files held in the primary storage. The index 512 is used by the backup apparatus 240 to determine whether files requested to be backed up by the local client 215a are already backed up by another client and, therefore, do not need to be backed up again. The RFE index 512 holds a file record for each file stored. Each file record, as illustrated in Figure 8, only takes a few bytes (a CRC calculated over the file size and modified date and time stamp, and 8 bytes of file [...]), so even with millions of files to store, the RFE index memory requirements are not excessive.

When a client requests files to be backed up, it sends a list containing only the four bytes of CRC for each file, the CRC being sufficient information to allow file comparison and identification. Using the CRC information, each file in the list is compared with entries in the RFE index 512 for a match. If any matches are found, the tape backup apparatus 240 returns a list of these files, plus their signatures and locations, to the client, which compares the returned signatures with signatures generated for its local files to determine if the files are exactly the same.
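The two-stage check can be sketched as below, assuming the 4-byte CRC is a CRC-32 computed over the file size and modified time packed as two 64-bit integers, and that the full-content signature is a SHA-256 digest; the patent specifies neither the encoding nor the signature algorithm, and all names are hypothetical. The CRC only nominates candidates (different files can share a CRC), which is why the client-side signature comparison is needed before a file is treated as redundant.

```python
from __future__ import annotations

import hashlib
import struct
import zlib
from dataclasses import dataclass


def file_crc(size_bytes: int, mtime: int) -> int:
    """4-byte CRC over a file's size and modified date/time stamp.
    Packing both as little-endian 64-bit integers is an assumption."""
    return zlib.crc32(struct.pack("<qq", size_bytes, mtime)) & 0xFFFFFFFF


@dataclass(frozen=True)
class RfeRecord:
    signature: bytes  # full-content signature (SHA-256 here, an assumption)
    location: str     # where the stored copy lives (hypothetical format)


class RfeIndex:
    def __init__(self) -> None:
        self._by_crc: dict[int, RfeRecord] = {}

    def add(self, crc: int, record: RfeRecord) -> None:
        # One entry per common file: the first client to introduce it wins.
        self._by_crc.setdefault(crc, record)

    def match(self, crcs: list[int]) -> dict[int, RfeRecord]:
        """Server side: candidate matches for the CRCs a client sent."""
        return {c: self._by_crc[c] for c in crcs if c in self._by_crc}


def confirm_redundant(local_content: bytes, record: RfeRecord) -> bool:
    """Client side: a CRC match is only a candidate; the signature decides."""
    return hashlib.sha256(local_content).digest() == record.signature
```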

BACKUP STORAGE MODULE 

Each backup agent 215 sends the backup data and the current directory tree file (DTF) to the tape backup apparatus 240 for storage by a backup storage module (BSM) 520. The backup data comprises the list of file data to be backed up. The backup storage module 520 stores the data in a number of areas of the hard disk drive device 244, including:

• Full area 524 (which holds the baseline full backup and the respective fingerprint data)
• Delta area 526 (which holds changes since the baseline full backup and the respective fingerprint data)
• Merge area 528 (which is the workspace used during merge of changes into new baseline full backups)

The hard disk drive device 244 also holds working files such as the prioritisation file 502 and the backup configuration file 504, and, where applicable, the RFE index 512. The backups in the full area 524 consist of a single backup data file and directory tree file for each client 215, and these are used as a baseline for the delta backup data. For example, if a delta is the first block of a file, then to restore this file the first block is obtained from the delta backup and the rest is obtained from the baseline full backup. The full backups are initialised when the first backup is performed on the client system. However, if the initial full backups were then left untouched, over time there would be an unmanageably large number of delta backups (which would impact storage space and restore time). Thus, there must be a regular update of the baseline full backups by merging the delta data into a new baseline full backup, as will be described below.
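Restoring a file from the baseline plus its deltas amounts to overlaying changed blocks on the baseline blocks, applying the deltas in time order so that the newest version of each block wins. The sketch below assumes fixed-size blocks, represents each delta as a hypothetical `{block number: data}` mapping, and does not model files that shrink.

```python
from __future__ import annotations


def restore_file(baseline_blocks: list[bytes], deltas: list[dict[int, bytes]]) -> bytes:
    """Rebuild a file from its baseline full backup plus its delta backups.

    baseline_blocks: the file's blocks as held in the full area.
    deltas: changed blocks per delta backup, oldest first, keyed by block number.
    """
    blocks = list(baseline_blocks)
    for delta in deltas:                  # oldest to newest: later deltas win
        for index in sorted(delta):
            data = delta[index]
            if index < len(blocks):
                blocks[index] = data      # block changed since the baseline
            elif index == len(blocks):
                blocks.append(data)       # file grew by one block
            else:
                raise ValueError(f"delta block {index} would leave a gap")
    return b"".join(blocks)
```

With a baseline of `[b"AAAA", b"BBBB", b"CCCC"]` and a single delta `{0: b"aaaa"}`, the first block comes from the delta and the rest from the baseline, giving `b"aaaaBBBBCCCC"`, which matches the example in the text.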
The delta area 526 contains the delta backups from each client. Delta backups also each comprise a backup data file and a directory tree file. Note that the directory tree files are themselves deltas on the baseline full backup directory tree file.
The merge area 528 is used for merge operations. At pre-determined regular intervals, for example every month, there is a space-saving operation to merge the oldest delta backups for each client into a new baseline full backup. Also, there are regular space-saving operations to merge the hourly delta backups for a 24-hour period into a single delta backup, as described below.

MERGE CONTROL MODULE

A merge control module (MCM) 530 is responsible for merging multiple delta backups together, either with the baseline full backup or with other delta backups. The purpose of this is to reduce the amount of on-line capacity used by deltas, while maintaining a reasonable history of changes, such as the deltas for each day (hourly backups).

The MCM 530 can be configured by the server administrator with merge criteria to suit the network environment. The criteria are stored in the backup configuration file 504. For example, keeping hourly backups would not be productive beyond one day. Therefore, one possible default criterion is to merge the last 24 hours' deltas into one daily delta, for example at 11:59pm each day. Another possible scenario is for the MCM 530 to reduce the number of daily deltas, at the end of four weeks, by merging the oldest two weeks of deltas into a new baseline full backup. In this example, whenever the user requests a restore view, they have the capability to view hourly history for the current day and at least two weeks of daily history.

If the backup storage delta area 526 reaches a pre-determined threshold, for example 95% capacity, the MCM 530 overrules the merge criteria and performs an immediate merge of the older deltas into the baseline full backup.
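The default criteria and the 95% override can be expressed as a small decision function. This is a sketch under stated assumptions: `MergeCriteria` is a hypothetical stand-in for the values held in the backup configuration file 504, "four weeks" is taken as 28 days, and the functions only decide which merge to run and which daily deltas it covers, not how the data itself is merged.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto


class MergeAction(Enum):
    NONE = auto()
    DAILY = auto()          # fold the last 24 hours' hourly deltas into one daily delta
    INTO_BASELINE = auto()  # fold the oldest deltas into a new baseline full backup


@dataclass(frozen=True)
class MergeCriteria:
    """Hypothetical stand-in for the merge criteria held in configuration file 504."""
    baseline_after_days: int = 28   # act "at the end of four weeks"
    baseline_merge_days: int = 14   # fold "the oldest two weeks" of deltas
    delta_area_limit: float = 0.95  # emergency threshold for the delta area 526


def choose_merge(oldest_delta_age_days: int, delta_area_used: float,
                 end_of_day: bool, criteria: MergeCriteria | None = None) -> MergeAction:
    c = criteria or MergeCriteria()
    if delta_area_used >= c.delta_area_limit:
        return MergeAction.INTO_BASELINE  # overrules the normal criteria
    if oldest_delta_age_days >= c.baseline_after_days:
        return MergeAction.INTO_BASELINE
    if end_of_day:
        return MergeAction.DAILY
    return MergeAction.NONE


def deltas_to_fold(delta_ages_days: list[int],
                   criteria: MergeCriteria | None = None) -> list[int]:
    """Ages of the daily deltas old enough to go into the new baseline."""
    c = criteria or MergeCriteria()
    cutoff = c.baseline_after_days - c.baseline_merge_days
    return sorted(age for age in delta_ages_days if age >= cutoff)
```

With daily deltas aged 0 to 27 days, `deltas_to_fold` selects ages 14 to 27, i.e. the oldest two weeks, leaving two weeks of daily history available for restore views.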

Another function of the MCM 530 is to delete files from the baseline full backup when they have been deleted from a respective client system and are older than a predefined, user-configurable period such as one month. The MCM 530 regularly (e.g. weekly) compares the directory tree files of the baseline full backup with the directory tree files of the delta backups. Since directory tree files contain a 'snap-shot' of all files present in the file system at the time, deleted files will be present in the full backup but not in any of the delta backups. After any deleted files have been identified, if these files are older than the predefined period (e.g. one month), they are removed from the baseline full backup. In this way, the storage requirements of the baseline full backup are not dramatically increased by old deleted data. It is still possible to keep copies of old data in archives by using offsite tapes generated by a tape backup module.
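The deleted-file check reduces to a set difference between the baseline DTF and the union of the delta DTFs, followed by an age filter. In this sketch the baseline DTF is assumed to record a modified time for each path, "one month" is approximated as 30 days, and the function name is hypothetical.

```python
from __future__ import annotations

ONE_MONTH_S = 30 * 24 * 3600  # "one month", taken as 30 days (assumption)


def files_to_prune(baseline_tree: dict[str, float], delta_trees: list[set[str]],
                   now: float, min_age_s: float = ONE_MONTH_S) -> set[str]:
    """Paths to remove from the baseline full backup.

    baseline_tree: path -> modified time recorded in the baseline DTF (assumed field).
    delta_trees:   the set of paths listed in each delta DTF snapshot.
    """
    if not delta_trees:
        return set()  # no snapshots since the baseline, so no evidence of deletion
    present = set().union(*delta_trees)
    deleted = baseline_tree.keys() - present  # in the full backup, in no delta
    return {path for path in deleted if now - baseline_tree[path] > min_age_s}
```

A file deleted only recently is kept in the baseline until it passes the age limit, so a user who notices the loss soon afterwards can still restore it from on-line storage.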

A further feature of the MCM 530 is to detect when multiple pointers, from different clients, point to a single baseline file entry. In this case, a standard merge operation to merge a delta for one client with the baseline entry cannot occur, otherwise the entry would be wrong for the other client(s). The means to overcome this, applied by the MCM 530, is not to merge the baseline entry with any deltas unless all the deltas are the same. One alternative would be to carry out the merge for one client but modify the deltas for the other client(s) [...].
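The conservative rule applied by the MCM 530 for a shared baseline entry can be stated as a predicate. The names are hypothetical, and each client's pending delta is treated as an opaque byte string for comparison purposes.

```python
from __future__ import annotations


def can_merge_into_baseline(referencing_clients: set[str],
                            pending_deltas: dict[str, bytes]) -> bool:
    """Whether a baseline entry may be merged with its pending deltas.

    referencing_clients: clients whose pointers lead to this baseline entry.
    pending_deltas: client -> delta waiting to be merged (absent = unchanged).
    """
    if len(referencing_clients) <= 1:
        return True  # unshared entry: a standard merge is safe
    deltas = [pending_deltas.get(client) for client in referencing_clients]
    # Shared entry: merge only if every referencing client has the same delta,
    # otherwise the merged entry would be wrong for the other client(s).
    return None not in deltas and len(set(deltas)) == 1
```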

TAPE BACKUP MODULE 

As already described, initially all the selected backup data is sent from the backup agents to the on-line storage of the tape backup apparatus 240, from which clients can restore. However, this does not provide a complete backup solution, since the backup data is still susceptible to a disaster: it may still be on the same site as the clients. Thus, in accordance with the present invention, a copy of the tape backup apparatus on-line data is made to removable storage (in this case tape) so that it can be taken off-site. The backup-to-tape step is made on the basis of predefined criteria independent of any messages received from clients or elsewhere.

A tape backup module (TBM) 540 provides this capability. The TBM 540 copies the tape backup apparatus's on-line data to tape media at a pre-determined time, copying an image of the data blocks on the hard disk drive device 244 to tape. Such block-level copying allows large blocks of data to be read directly from the disk, rather than reading each file one at a time through the file system. This improves the data rate from disk to tape and thus ensures that the tape is kept constantly supplied (streaming) with data.

The tape backup operation necessarily runs while the tape backup apparatus 240 is still actively able to accept and manage backup data from the clients. For this reason, the TBM 540 incorporates active file manager technology, which is described above, to prevent backup corruption.

The backup administrator can schedule the generation of an offsite backup tape by specifying the day/date and the time of day they want the tape. This configuration is stored in the backup configuration file 504. In other words, the administrator can configure the TBM 540 to produce a complete tape copy of all backup data at a convenient time, for example, to take the tape with them when they leave work at 5.30pm each weekday. The TBM 540 calculates the start time of a tape backup based on the administrator's completion time, the amount of data held on the backup server, and the data transfer rate of the tape backup apparatus.
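The start-time calculation is a simple subtraction: the copy takes roughly (data held on line) / (transfer rate), so it must begin that long before the requested completion time. A minimal sketch; the optional safety margin is an assumption rather than something the text specifies, and the data volume and rate in the example are hypothetical.

```python
from datetime import datetime, timedelta


def tape_start_time(completion: datetime, data_bytes: int, rate_bytes_per_s: float,
                    margin: timedelta = timedelta(0)) -> datetime:
    """Latest start that lets a full disk-to-tape copy finish by `completion`.

    `margin` is an optional safety allowance (an assumption, not from the text).
    """
    duration = timedelta(seconds=data_bytes / rate_bytes_per_s)
    return completion - duration - margin


# Hypothetical figures: 9 GB held on line, streamed to tape at 1 MB/s.
finish = datetime(1998, 9, 1, 17, 30)  # 5.30pm deadline
print(tape_start_time(finish, 9_000_000_000, 1_000_000))  # 1998-09-01 15:00:00
```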

The default offsite tape backup schedule is every weekday at 5.30pm. If a tape is loaded in the drive at the start time of the backup, it will automatically be overwritten with a new full backup. At the end of the backup, the tape is automatically ejected from the tape backup apparatus so that it is clearly finished and ready to be taken away by the administrator.

Since deleted files more than one month old will be removed from the baseline full backup, the offsite tapes can be used as archive storage for such files. By saving an offsite tape at regular intervals (for example at the end of each week), the user can archive the backup data in case any old deleted files are ever required in the future. Also, there may be legal requirements to keep backup data for a period of time. There should be an option to enforce data retention of offsite media [...].

It is also possible for a client 215 to use an archive tape for restoring files. The way this is done is to mount the tape backup apparatus image on the tape 242 as a volume, giving access to the directory tree files and backup data files on the tape. To speed up generation of the restore tree, the tape copies of the directory tree files can be copied onto the tape backup apparatus hard disk drive device. After the archive tape is successfully mounted for restore access, the restore tree views in Windows Explorer provide an archive restore tree, as described above.

DISASTER RECOVERY MODULE 

When a client system 210a completely fails, the tape backup apparatus 240 can be used to restore the complete data environment onto a new replacement or repaired system. All of the files are held on the on-line storage 244 of the tape backup apparatus 240. A disaster recovery module (DRM) 550 recovers requested files from the baseline full backup for the client and any deltas. The new system must have the appropriate operating system installed, and then the backup agent is installed to communicate with the tape backup apparatus 240. The restore module 350 of a client is used by an administrator to communicate with the DRM 550 and copy back all of the data from the last backup (in effect a reverse RFE).

The DTFs on the tape backup apparatus 240 are used to determine the state of the system to be recovered. There are also options to select an older system state to restore for the disaster recovery by using previous delta versions of the directory tree files, which would be used if the latest state of the system were corrupted.

Because large quantities of data need to be transferred over the network to recover a complete system, there is also an option to schedule the disaster recovery operation. Since the recovery is performed from the on-line storage 244 in the tape backup apparatus 240, no user intervention is required and the recovery can proceed unattended at any time.

BACKUP OPERATION 

A basic backup operation from the client 210a to the tape backup apparatus 240 will now be described with reference to the flow diagram in Figure 6, which splits client-side and tape backup apparatus-side operations.

For the backup agent 215a, the dynamic scheduler 310 schedules a backup operation, in step 600, on the basis of the time elapsed since the last backup and/or the amount of new data, client loading and/or network loading. When the criteria for the backup operation are met, the dynamic scheduler 310 issues a request, in step 605, to the tape backup apparatus 240 for a backup slot. The tape backup apparatus 240 receives the request, and the backup scheduler 500 checks the tape backup apparatus loading and network loading in step 610 [...]. If the request is refused, the dynamic scheduler 310 makes further requests until a request is accepted. Once a request is accepted by the tape backup apparatus 240, in step 615 the FDM 330 compiles, from all the files stored on the client 210a, a first list of files that have been amended or added since the client was last backed up.
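Steps 615 and 620 amount to filtering the client's files by modification time and by whether they existed at the last backup. The sketch below is a simplification under stated assumptions: it scans a list of hypothetical `FileState` records rather than using the FDA log 347, the function names are hypothetical, and the timestamps in the example are invented to mirror Figure 7.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FileState:
    path: str
    size: int
    mtime: float  # modified date/time stamp, POSIX seconds


def build_first_list(current: list[FileState], last_backup: float,
                     backed_up_paths: set[str]) -> list[FileState]:
    """Step 615: files amended or added since the last backup."""
    return [f for f in current
            if f.path not in backed_up_paths or f.mtime > last_backup]


def build_second_list(first: list[FileState], backed_up_paths: set[str]) -> list[FileState]:
    """Step 620: the new files only (these are checked against the RFE index)."""
    return [f for f in first if f.path not in backed_up_paths]


# Hypothetical timestamps mirroring Figure 7: Files 1 to 3 existed at the last backup.
files = [FileState("File 1", 8192, 200.0), FileState("File 2", 8192, 150.0),
         FileState("File 3", 4096, 50.0), FileState("File 4", 2048, 300.0),
         FileState("File 5", 1024, 40.0)]
known = {"File 1", "File 2", "File 3"}
first = build_first_list(files, last_backup=100.0, backed_up_paths=known)
print([f.path for f in first])                            # ['File 1', 'File 2', 'File 4', 'File 5']
print([f.path for f in build_second_list(first, known)])  # ['File 4', 'File 5']
```

File 5 is treated as new even though its timestamp predates the last backup, since a file copied onto the client may keep an old modification time.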

By way of example, assume that the client 210a stores only five files. The states of the five exemplary files stored by the client 210a are illustrated in Figure 7. Obviously, in practice, the number of files will be much larger.

As shown in Figure 7a, File 1 and File 2 are modified files, where File 1 has modified blocks [...] and 7 and a respective modified date and time stamp, and File 2 has a modified block 5 and a respective modified date and time stamp. [...] The modified portions of the files are highlighted in the diagram using thick lines, and information on the modified blocks is stored in the FDA log 347. File 3 is an existing file, which has not been modified since the last backup, and File 4 and File 5 are new files. As shown, File 4 is a common file which has an exact copy already on the tape backup apparatus 240 for another client, and File 5 is a unique file which is new to both the client 210a and the tape backup apparatus 240. Although the new files are shown in Figure 7a, this is purely for ease of explanation herein, and it will be appreciated that in practice the client 210a has no advance information about whether the tape backup apparatus 240 already contains versions of new files.
The first list, although not specifically shown, includes all the files illustrated in Figure 7a except File 3, since File 3 has not been modified. Having built the first list, in step 620 the FDM 330 compiles a second file list for new files, including File 4 and File 5, as illustrated in Figure 7b. As shown, the second list contains for each file only a respective 4-byte CRC (calculated over the file date/time stamp and file size information). The second list, comprising CRCs, is then transmitted to the tape backup apparatus 240 in step 623. The 4-byte amount of information per file minimises the network bandwidth required to send the information to the tape backup apparatus 240, while at the same time providing enough information for comparison purposes.

The tape backup apparatus 240 receives the request, in step 625, and the RFE module 510 compares the second file list with the entries in the RFE index 512 to find matching files which are already stored on the tape backup apparatus 240.

An exemplary RFE index is illustrated in Figure 8. The RFE index 512 in Figure 8 includes entries for three clients: Client 110a, Client 110b and Client 110n. Also shown is a representation of the tape backup apparatus on-line storage 242, representing the arrangement of files stored therein in very simple terms, for ease of understanding only (that is, the construction of files in the on-line storage 242 is not shown in terms of DTFs and BDFs). Figure 8 also shows the association between each file reference in the RFE index 512 and the files stored in the on-line storage 242, although it will be appreciated that no physical association, such as pointers, is stored by the RFE index 512.
In this case, File 4 is a common file (in this example, it was initially stored for Client 110n), and File 1 is also shown as being a common file (in this example, it was initially stored for Client 110b). There is only one entry for each common file in the RFE index 512, where the entry is associated with the first client which introduced the file to the tape backup apparatus 240.
Returning to the flow diagram, in step 630, the RFE module 510 compiles and returns a third list of files, and respective signatures and pointers, for the files that have RFE index entries, as shown in Figure 7c. Thus, only File 4 is included in the third list, whereas File 5, being a unique file, is ignored. The backup agent 215a receives the third list, in step 635, and the BDM 340 calculates a signature for each file stored on the client 210a which appears in the list. In step 640 the calculated signatures are compared with the respective received signatures (in this case there is only one calculated and one received signature, for File 4) to confirm which files are already stored on the tape backup apparatus 240 (i.e. which files are redundant) and thus do not need backing up.

Next, for each modified file (File 1 and File 2), the BDM 340 determines which specific parts of the files are different, in step 645. For this operation, the BDM 340 communicates with the tape backup apparatus 240 to retrieve the respective fingerprint information, as illustrated by step 645a.
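Fingerprint comparison can be pictured as hashing each fixed-size block of the stored file and comparing block by block with the local copy, so that only the differing blocks need to be sent. The block size, the use of SHA-256 and the function names below are all assumptions; the patent does not define the fingerprint format. Truncation (a file becoming shorter) is not detected by this sketch.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4096  # assumption: the text does not give a block size


def fingerprints(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """One digest per fixed-size block of the file."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(local: bytes, stored_fps: list[bytes],
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Blocks of the local file whose fingerprint differs from the stored one."""
    diff: dict[int, bytes] = {}
    for n, fp in enumerate(fingerprints(local, block_size)):
        if n >= len(stored_fps) or fp != stored_fps[n]:
            diff[n] = local[n * block_size:(n + 1) * block_size]
    return diff
```

Only the returned blocks (here keyed by block number) would need to travel over the network as the file differences for a modified file.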

In step 648 the DTM 350 builds a fourth list, as illustrated in Figure 7d, which shows File 1, File 2, File 4 and File 5. This list comprises at least some information for all new and modified files. The data included with each entry in the fourth list are: file name; modified date/time stamp; file differences (for modified files) or entire file data (for new and non-redundant files); signature; and pointer (if the file is a new, redundant file, the pointer is included to indicate where on the tape backup apparatus 240 the file is already located).

Then, in step 650, the DTM 350 transmits the fourth list, as a backup data stream, to be backed up to the tape backup apparatus 240.

Finally, in step 655, the BSM 520 receives the data and arranges and stores the data in the tape backup apparatus 240.

The above-described process outlines the steps required for a simple backup operation according to the present embodiment. The process can be varied or improved upon without moving away from the scope or the essence of the present invention. For example, more complex RFE procedures may be applied to cope with partial file redundancy, where the tape backup apparatus recognises that new files from one client are only slightly different from existing backed up files. As a result, only the differences between the new files and the already-stored files need to be backed up.

Figure 9 is a block diagram which illustrates the components of an exemplary tape backup apparatus according to the present invention.

In Figure 9, the tape backup apparatus is referenced 900 and includes an interface 905 for transmitting data between the tape backup apparatus 900 and one or more clients (not shown). The interface 905 may comprise, for example, a local area network adapter, if the apparatus is configured to be attached directly to a network, or a SCSI (small computer system interface) adapter, if the apparatus is configured to be attached directly to a computer. Thus, one or more clients can address the tape backup apparatus either directly across the network or via the computer.

In the tape backup apparatus 900, a controller 910 controls the operation of all components of the backup apparatus 900, is responsible for processing messages received from clients, and is responsible for managing the data movement within the apparatus, for example between the hard disk drive device 920 and the tape 940. The controller communicates with the other components of the apparatus via a system bus 912. The controller 910 typically comprises a microprocessor, for example a Motorola 68000 series microprocessor or an Intel 80386 microprocessor. The operation of the controller is determined by a program comprising firmware instructions stored in ROM 915. Main memory 913 comprising RAM is accessible by the controller 910 via the system bus 912.

The hard disk drive device 920 is connected to the interface 905 such that it can receive or send client data from or to the interface 905. In a particularly preferred embodiment, the tape backup apparatus 900 includes further functionality to compress data before storing it on the hard disk drive device 920, thereby reducing the storage capacity requirement thereof. Many well-known compression algorithms may be used, for example a Lempel-Ziv substitution algorithm.
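As one concrete example of Lempel-Ziv substitution compression, Python's standard `zlib` module implements DEFLATE, which combines LZ77 with Huffman coding. The wrapper names below are hypothetical, and the text does not say which algorithm or compression level the apparatus uses.

```python
import zlib


def compress_for_disk(data: bytes, level: int = 6) -> bytes:
    """Compress backup data before it is written to the hard disk drive device."""
    return zlib.compress(data, level)


def decompress_from_disk(blob: bytes) -> bytes:
    """Recover the original backup data when it is read back for restore or tape copy."""
    return zlib.decompress(blob)
```

Highly repetitive backup data compresses well, which is why compression reduces the on-line capacity requirement; already-compressed files gain little.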

A read/write processor 925 is connected to the hard disk drive device 920. The read/write processor 925 is arranged to receive data from the hard disk drive device 920 and convert it into a form suitable for driving read/write heads 930 of a tape mechanism 935 for storage of the data to tape media 940, or to receive data from the tape media 940 and convert it into a form suitable for storage on the hard disk drive device 920. Additionally, the read/write processor 925 includes error correction/detection functionality, which uses, for example, Reed-Solomon encoding, which is well known in the data storage art. The tape media 940 is mounted in the tape mechanism 935, which loads and ejects the tape media 940, winds the tape media forwards or backwards as required and actuates the read/write heads 930 as appropriate. For example, the tape heads may be mounted on a rotating drum which rotates at an oblique angle to the travel of the tape, such as in a well-known DDS (Digital Data Storage) tape drive. Alternatively, the tape heads may be mounted for perpendicular movement in relation to the travel of the tape, such as in digital linear tape recording technology.

The interface 905 and read/write processor 925 typically each comprise one or more appropriately programmed application-specific integrated circuits (ASICs).

The components of the tape backup apparatus in its preferred embodiment are housed in a single housing (not shown) for convenience, thereby providing a dedicated data backup and restore apparatus and solution. The apparatus can be thought of as a novel tape drive comprising extra functionality and a large, non-volatile data storage facility. In practice, the non-volatile storage must have a capacity equal to or, most preferably, greater than the capacity of the combined local storage of all clients using the apparatus as a backup solution. Equally, the tape storage capacity of the apparatus would need to take into consideration any data compression being implemented.
CLAIMS



1. Tape storage apparatus, comprising:

interface means for connecting the apparatus to one or more clients;

controller means for controlling the apparatus and for processing messages received from the one or more clients;

primary storage means; and

tape storage means,

wherein the controller is programmed:

to process backup and restore messages received from the one or more clients respectively to backup to the primary storage means data received from the clients and to restore to said clients data from the primary storage means; and

to backup to the tape storage means, in accordance with pre-defined criteria, at least some of the data stored in the primary storage means and to restore to the primary storage means, in accordance with a respective restore message received from a client, at least some data stored in the tape storage means.

2. Apparatus according to claim 1, wherein the controller is programmed to maintain stored on the primary storage means at least the most recent version of all data received from the clients.

3. Apparatus according to either preceding claim, wherein the controller means is programmed to backup data stored in the primary storage means to the tape storage means independently of any messages from the clients.



4. Apparatus according to any one of the preceding claims, wherein the primary storage means comprises a random access storage means.

5. Apparatus according to any one of claims 1 to 3, wherein the primary storage means comprises non-volatile random access memory.

6. Apparatus according to any one of the preceding claims, wherein the controller is programmed by instructions that are stored in non-volatile memory of the controller, said instructions being read and processed as required by the controller.
7. Apparatus according to any one of the preceding claims, further comprising a housing configured specifically to house the controller, the interface means, the primary storage means and the tape storage means.

8. Apparatus according to any one of the preceding claims, not provided with interface means for either or both of a keyboard and a visual display unit.

9. Apparatus according to any one of the preceding claims, wherein the controller is programmed for storing in the primary storage means baseline backup data and delta backup data.

10. Apparatus according to claim 9, wherein the controller is programmed for incorporating the delta backup data into the baseline backup data, to form new baseline data, in accordance with pre-determined criteria.



11. Apparatus according to any one of the preceding claims, wherein the controller is programmed for receiving from a client a backup request message and for responding in the negative if either or both of the apparatus loading and the network loading exceed respective pre-determined limits.

12. Apparatus according to any one of the preceding claims, wherein the controller is programmed for receiving from a client a message including an indication of the particular data the client wishes to backup, and for responding by indicating which (if any) of the particular data already has a version stored in the primary storage means.

13. Apparatus according to any one of the preceding claims, wherein the controller is programmed to respond to a request message received from a client to restore to the client particular data, comprising combining respective delta and baseline data stored in the primary storage means to form the particular data.

14. Apparatus according to claim 13, wherein the request message from the client requests restoration of particular data that is not the most recent version thereof backed up by the client, further comprising combining data stored in the tape storage means with the respective delta and baseline data to form the particular data.

15. A method of backing up to a data backup and restore apparatus attached to a network data stored in 



WO 99/12098 



PCT/GB98/02603 






one or more clients also attached to the network, the method comprising the apparatus storing in primary data storage at least the most recent version of all data on the clients and, from time to time, in accordance with pre-defined criteria, storing in secondary data storage at least some of the data stored in the primary data storage.

16. A data storage system comprising:
a network;

tape storage apparatus as claimed in any one of claims 1 to 14; and

at least one client connected to the apparatus, the (or the at least one) client comprising client storage means and client processing means, the client processing means being programmed in accordance with pre-determined criteria to determine when data stored in the client storage means should be backed up to the tape backup apparatus.

17. A system according to claim 16, wherein the client processing means comprises means to form and send a message to the tape storage apparatus, the message including a request for a backup operation.

18. A system according to claim 16 or claim 17, wherein the client processing means comprises means to schedule a backup operation, means to select data in the client storage means to be backed up, and means to transmit data across the network for storage by the tape storage apparatus.

19. A client configured for operation in a data storage system as claimed in any one of claims 16 to 18.

20. Data backup and restore apparatus, comprising:
interface means for connecting the apparatus to one or more clients;

a controller;

a primary storage means; and

a secondary storage means,
wherein the controller is programmed:
to process backup and restore messages received from the one or more clients respectively to backup to the primary storage means data received from the clients and to restore to said clients data from the primary storage means; and
to backup to the tape storage means, in accordance with pre-defined criteria, at least some of the data stored in the primary storage means and to restore to the primary storage means, in accordance with a respective restore message received from a client, at least some data stored in the tape storage means.